COMPLECS: High-Throughput and Many-Task Computing - Slurm Edition

Thursday, October 17, 2024

6:00 PM - 7:30 PM UTC

This event will be held remotely.

Not all computational problems utilize the types of parallel applications traditionally designed to run on high-performance computing (HPC) systems. In fact, many of the workloads run on these systems today require only a modest amount of computing resources for any given job or task. For certain research workloads, however, a more important consideration is how much aggregate compute power can be consistently and reliably leveraged against a problem over time. These high-throughput computing (HTC) workloads aim to solve larger problems over longer periods of time by completing many smaller computational subtasks. For example, they often involve large sweeps over simulation input parameters or the regular processing and analysis of data collected from specialized instruments. In some cases, these problems will also be composed of many distinct computational subtasks linked together in highly structured, complex workflows, which can themselves become a challenge to design and manage effectively. If your research problem can leverage a high-throughput or many-task computing (MTC) model, then it is important to learn how to build and run these types of workflows safely and effectively on HPC systems.

In this third part of our series on Batch Computing, we introduce you to high-throughput and many-task computing using the Slurm Workload Manager. In particular, you will learn how to use Slurm job arrays and job dependencies, which can be used to create these more structured computational workflows. We will also highlight some of the problems you are likely to come across when you start running HTC and/or MTC workloads on HPC systems, including a discussion of job bundling strategies: what they are and when to use them. Additional topics about high-throughput and many-task computing workflows will be covered as time permits.
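To give a sense of what these Slurm features look like in practice, here is a minimal sketch of a job array driving a parameter sweep, followed by a job dependency that chains a post-processing step behind it. The executable (./simulate), input file names, analysis script (analyze.sh), partition, account, and resource requests are all placeholders for illustration and would need to be replaced with values appropriate for your own workload and system.

    #!/usr/bin/env bash
    #SBATCH --job-name=sweep
    #SBATCH --partition=shared          # placeholder partition name
    #SBATCH --account=abc123            # placeholder allocation/account
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=2G
    #SBATCH --time=00:30:00
    #SBATCH --array=1-100%10            # 100 array tasks, at most 10 running at once
    #SBATCH --output=sweep.%A_%a.out    # %A = array job ID, %a = array task index

    # Each array task handles one point in the parameter sweep,
    # selected here by its array task index.
    ./simulate "input_${SLURM_ARRAY_TASK_ID}.txt"

A job dependency can then hold back an analysis step until every array task has finished successfully:

    jobid=$(sbatch --parsable sweep.sh)
    sbatch --dependency=afterok:${jobid} analyze.sh

This is only a sketch of the general pattern; the session will cover how to apply and adapt these building blocks to real HTC and MTC workflows.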

---
COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC program covering the non-programming skills needed to use supercomputers effectively. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management, and interactive computing. Each session offers 1 hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.

Instructor

Marty Kandes

Computational and Data Science Research Specialist, SDSC

Marty Kandes is a Computational and Data Science Research Specialist in the High-Performance Computing User Services Group at SDSC. He currently helps manage user support for Comet, SDSC's largest supercomputer. Marty obtained his Ph.D. in Computational Science in 2015 from the Computational Science Research Center at San Diego State University, where his research focused on studying quantum systems in rotating frames of reference through the use of numerical simulation. He also holds an M.S. in Physics from San Diego State University and B.S. degrees in both Applied Mathematics and Physics from the University of Michigan, Ann Arbor. His current research interests include problems in Bayesian statistics, combinatorial optimization, nonlinear dynamical systems, and numerical partial differential equations.